A hierarchical network heuristic for solving the orientation problem in genome assembly
Authors
Abstract
In the past several years, the problem of genome assembly has received considerable attention from both biologists and computer scientists. An important component of current assembly methods is the scaffolding process. This process builds ordered and oriented linear collections of contigs (contiguous sequences assembled from overlapping reads), called scaffolds, and relies on mate pair data. A mate pair is a set of two reads that are sequenced from the ends of a single fragment of DNA and therefore have opposite mutual orientations. When the two reads of a mate pair are placed into two different contigs, one can infer the mutual orientation of these contigs. While several orientation algorithms exist as part of assembly programs, all of them encounter difficulties in solving the orientation problem because of mis-assemblies in contigs or errors in read placement. In this paper we present an algorithm based on hierarchical clustering that solves the orientation problem independently and is robust to errors. We show that our algorithm correctly solves the orientation problem for both faux (generated) assembly data and real assembly data for the bacterium R. sphaeroides. We demonstrate that our algorithm is stable to both changes in the initial orientations and noise in the data, which makes it advantageous compared to traditional approaches.

Author Summary

Constructing an organism's entire DNA sequence from raw genome sequencing data, like the data produced in the Human Genome Project, is a challenging task. The type of data generated in the sequencing process has changed substantially over the years as a result of various technological improvements. The computer programs that convert such data into assembled sequences must continuously be revised to keep pace with the changing nature of the data. This paper builds upon current methods from the emerging field of network science to develop a new way of analyzing and correcting sequencing data. We show that our algorithm is both more robust to erroneous data and more accurate overall than current techniques.
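
To make the orientation problem concrete, the following minimal Python sketch treats contigs as nodes of a signed graph and mate-pair links as votes for "same" or "opposite" orientation. It is a simple majority-vote and breadth-first propagation baseline of the traditional kind, not the hierarchical clustering algorithm presented in the paper. The function names `orient_contigs` and `count_conflicts`, the `(contig_a, contig_b, same)` link format, and the +1/-1 orientation encoding are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def orient_contigs(links):
    """Assign +1/-1 orientations to contigs from mate-pair links.

    links: iterable of (contig_a, contig_b, same) tuples, where `same` is True
    when a mate pair suggests both contigs share an orientation and False when
    it suggests one of them must be flipped (hypothetical input format).
    """
    links = list(links)

    # Tally votes per contig pair: positive totals favour "same",
    # negative totals favour "opposite".
    votes = defaultdict(int)
    contigs = set()
    for a, b, same in links:
        contigs.update((a, b))
        if a != b:
            key = (a, b) if a <= b else (b, a)
            votes[key] += 1 if same else -1

    # Keep one signed edge per pair; tied pairs carry no usable signal.
    graph = defaultdict(list)
    for (a, b), score in votes.items():
        if score != 0:
            graph[a].append((b, score > 0))
            graph[b].append((a, score > 0))

    # Propagate orientations breadth-first from an arbitrary seed in each
    # connected component. The seed's orientation (+1) is arbitrary.
    orientation = {}
    for start in sorted(contigs):
        if start in orientation:
            continue
        orientation[start] = 1
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor, same in graph[node]:
                if neighbor not in orientation:
                    sign = 1 if same else -1
                    orientation[neighbor] = sign * orientation[node]
                    queue.append(neighbor)
    return orientation


def count_conflicts(links, orientation):
    """Count mate-pair links that disagree with the chosen orientations."""
    return sum(
        1
        for a, b, same in links
        if (orientation[a] == orientation[b]) != same
    )


if __name__ == "__main__":
    example_links = [
        ("c1", "c2", True),
        ("c1", "c2", True),
        ("c1", "c2", False),  # a noisy link, outvoted by the two above
        ("c2", "c3", False),
        ("c1", "c3", False),
    ]
    result = orient_contigs(example_links)
    print(result)
    print(count_conflicts(example_links, result))
```

A propagation scheme like this depends on which contig is visited first and on every majority vote being right, so a single erroneous edge can misorient an entire branch. That sensitivity to errors and to starting conditions is the weakness the abstract attributes to existing approaches.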
Publication date: 2013